Predicting the Resilience of Obfuscated Code Against Symbolic Execution Attacks via Machine Learning

Authors

Sebastian Banescu

Christian S. Collberg

Alexander Pretschner

Abstract

Software obfuscation transforms code such that it is more difficult to reverse engineer. However, it is known that, given enough resources, an attacker will eventually succeed in reverse engineering an obfuscated program. Therefore, an open challenge for software obfuscation is estimating how long an obfuscated program can withstand a given reverse engineering attack. This paper proposes a general framework for choosing the software features most relevant to estimating the effort of automated attacks. Our framework uses these features to build regression models that predict the resilience of different software protection transformations against automated attacks. To evaluate the effectiveness of our approach, we instantiate it in a case study that predicts the time needed to deobfuscate a set of C programs using an attack based on symbolic execution. Because training the regression models requires a large set of programs as input, we implemented a code generator that can produce large numbers of arbitrarily complex random C functions. Our results show that features such as the number of community structures in the graph representation of symbolic path constraints are far more relevant for predicting deobfuscation time than other features generally used to measure the potency of control-flow obfuscation (e.g., cyclomatic complexity). Our best model predicts the time, in seconds, taken by symbolic-execution-based deobfuscation attacks with over 90% accuracy for 80% of the programs in our dataset, which also includes several realistic hash functions.
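To make the evaluation criterion concrete, the following Python sketch fits a one-feature least-squares regression from a program feature (for example, a community count) to deobfuscation time, then measures the share of held-out programs whose predicted time falls within 10% of the measured time. It assumes that "90% accuracy" means a relative error of at most 10%. The data, the single feature, the function names `fit_line` and `share_within`, and the use of linear regression are all hypothetical illustrations, not the paper's models, features, or results.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("feature values must not all be equal")
    b = sxy / sxx
    return my - b * mx, b


def share_within(predicted, actual, tolerance=0.10):
    """Fraction of programs whose relative error |p - t| / t is at most `tolerance`.

    Assumption: "90% accuracy" is read as a relative error of at most 10%.
    """
    hits = sum(1 for p, t in zip(predicted, actual) if abs(p - t) <= tolerance * t)
    return hits / len(actual)


# Hypothetical training data: one feature per program (e.g. a community count)
# and the measured deobfuscation time in seconds. Not data from the paper.
train_x = [2, 4, 6, 8]
train_t = [10.0, 21.0, 29.0, 41.0]
a, b = fit_line(train_x, train_t)

# Evaluate on held-out programs (equally hypothetical), never on training data.
test_x = [3, 5, 7]
test_t = [12.0, 25.0, 50.0]
predictions = [a + b * x for x in test_x]
print(f"model: t = {a:.2f} + {b:.2f} * x")
print(f"share within 10%: {share_within(predictions, test_t):.2f}")
```

Swapping in a different regressor or several features would change only the fitting step; the evaluation metric would stay the same.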

Similar resources

Exploit Dynamic Data Flows to Protect Software Against Semantic Attacks

Unauthorized code modification based on reverse engineering is a serious threat to the software industry. Virtual machine based code obfuscation is emerging as a powerful technique for software protection. However, the current code obfuscation techniques are vulnerable to semantic attacks which use dynamic profiling to transform an obfuscated program to construct a simpler program that is funct...

HeNet: A Deep Learning Approach on Intel® Processor Trace for Effective Exploit Detection

This paper presents HeNet, a hierarchical ensemble neural network, applied to classify hardware-generated control flow traces for malware detection. Deep learning-based malware detection has so far focused on analyzing executable files and runtime API calls. Static code analysis approaches face challenges due to obfuscated code and adversarial perturbations. Behavioral data collected during exe...

Linear Obfuscation to Combat Symbolic Execution

Trigger-based code (malicious in many cases, but not necessarily) only executes when specific inputs are received. Symbolic execution has been one of the most powerful techniques in discovering such malicious code and analyzing the trigger condition. We propose a novel automatic malware obfuscation technique to make analysis based on symbolic execution difficult. Unlike previously proposed tech...

BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking

Detecting differences between two binary executables (binary diffing), first derived from patch analysis, has been widely employed in various software security analysis tasks, such as software plagiarism detection and malware lineage inference. Especially when analyzing malware variants, pervasive code obfuscation techniques have driven recent work towards determining semantic similarity in sp...

Syntia: Synthesizing the Semantics of Obfuscated Code

Current state-of-the-art deobfuscation approaches operate on instruction traces and use a mixed approach of symbolic execution and taint analysis; two techniques that require precise analysis of the underlying code. However, recent research has shown that both techniques can easily be thwarted by specific transformations. As program synthesis can synthesize code of arbitrary complexity, it...

Publication date: 2017